Improvements to seismicity forecasting based on a Bayesian spatio-temporal ETAS model

The epidemic-type aftershock sequence (ETAS) model provides an effective tool for predicting the spatio-temporal evolution of aftershock clustering in short-term. Based on this model, a fully probabilistic procedure was previously proposed by the first two authors for providing spatio-temporal predictions of aftershock occurrence in a prescribed forecasting time interval. This procedure exploited the versatility of the Bayesian inference to adaptively update the forecasts based on the incoming information provided by the ongoing seismic sequence. In this work, this Bayesian procedure is improved: (1) the likelihood function for the sequence has been modified to properly consider the piecewise stationary integration of the seismicity rate; (2) the spatial integral of seismicity rate over the whole aftershock zone is calculated analytically; (3) background seismicity is explicitly considered within the forecasting procedure; (4) an adaptive Markov Chain Monte Carlo simulation procedure is adopted; (5) leveraging the stochastic sequences generated by the procedure in the forecasting interval, the N-test and the S-test are adopted to verify the forecasts. This framework is demonstrated and verified through retrospective early forecasting of seismicity associated with the 2017–2019 Kermanshah seismic sequence activities in western Iran in two distinct phases following the main events with Mw7.3 and Mw6.3, respectively.


SI-1.1. Metropolis-Hastings (MH) algorithm
In order to sample from the posterior distribution p(|seq,M l ), Markov Chain Monte Carlo (MCMC) simulation routine is employed herein. The MCMC routine here employs the Metropolis-Hastings (MH) algorithm. The MH algorithm generates a Markov chain that produces a sequence of samples [ 1 →  2 → ⋯ →  i → ⋯], where  i represents the state of Markov chain at i th iteration (the first few samples are often discarded to reduce the initial transient effect). A Markov chain is a stochastic process where the transition from current state to a new state is done by using a conditional transition function that is conditioned on the current (last) state. Given the Markovian nature of MCMC simulation scheme, and to generate (i+1) th sample i+1 from the current i th sample i based on MH routine, the following procedure is done: (a) Simulate a candidate sample  * from a proposal distribution q(|i). It is important to note that there are no specific restrictions about the choice of q(ꞏ) apart from the fact that it should be possible to calculate both q( i+1 | i ) and q(i|i+1). (b) Calculate the acceptance probability min(1,r), where r is defined as follows (see also θ seq seq θ θ θ θ θ θ θ seq seq θ θ θ θ θ θ      ( S I -1 ) (c) Generate u from a Uniform distribution between (0, 1), i.e., u ~ Uniform (0, 1);  if u ≤ min(1,r) → set n+1= * (accept the candidate state to be taken as the next state of the Markov chain);  else set  n+1 = n (the current state is taken as the next state)

SI-1.2. Applying an adaptive Metropolis-Hastings algorithm (adaptive MCMC)
In order to improve the rate of convergence of the simulation process, we used an adaptive MH algorithm (as proposed in 52 ) that introduces a sequence of intermediate candidate evolutionary PDF's that resemble more and more the target PDF. Let {p 1 , p 2 ,…, p Nchain } be the sequence (chain) of PDF's leading to p(|seq,M l )=p Nchain , where Nchain is the number of chains and each chain contains Nseed samples (as indicated subsequently). The following adaptive simulation-based procedure is employed:  Simulate Nseed samples {1, 2, …, Nseed} (1) , where the superscript (1) denotes the first simulation level (first chain), with the target PDF p1 as the first sequence of samples (i.e., first simulation level, or the first chain). Since we have no idea of choosing a proposal PDF to get MCMC samples, and instead of accepting or rejecting a proposal for  involving all its components simultaneously (called block-wise updating scheme), it might be computationally simpler and more efficient at the first stage to make proposals for individual components of , one at a time (a component-wise updating approach). In the so-called block-wise updating, the proposal distribution has the same dimension as the target distribution. For instance, if the vector of model parameters  involve n uncertain parameters (herein, n can vary from 6 to 8 parameters), we design an n-dimensional proposal distribution, and either accept or reject the candidate state (with all n variables) as a block. The block-wise updating approach can be associated with high rejection rates. This may cause problem when we want to generate the first sequence of samples. Therefore, we utilize the more stable component-wise updating for the first chain. Knowing that the vector of ETAS model parameters  has n=6 to 8 variables (depending on whether we calculate K separately or leave it to MCMC updating, respectively and also on the type of the kernel density of distance), we start from the first variable and generate a candidate state based on a proposal distribution for this individual component, and finally accept or reject it based on MH algorithm. Note that in this stage, we have only varied the current component and kept the other variables in vector  constant. Then, we move to the next components one-by-one and do the same procedure, while considering the previous (updated) components. Therefore, what happens in the current step is conditional on the updated parameters in the previous steps. The proposal distribution for each component is assumed to be a lognormal distribution herein. The prior PDF for each component is a lognormal distribution (as shown in Equation 20 of the manuscript). If the parameter K is being calculated according to method (a) in Section 4.2.2 of the manuscript, this should be done after each realization of other uncertain parameters. The kernel density  (1) constructed in Equation (SI-2) approximates p1. The kernel function  can be viewed as a PDF consisting of bumps at  i , where width w i controls the common size of the bumps. Therefore, a large value of w i tends to over-smooth the kernel density, while a small value may cause noise-shaped bumps. In view of this, the w i can be assumed to have a fixed width (= w), or alternatively the adaptive kernel estimate can be employed (Ang et al. 1992, Au and Beck 1999) that is defined for each sample i, i=1:Nseed. The adaptive kernel has better convergence and smoothing properties over the fixed-width kernel estimate. The fixed width w is estimated as follows (Au and Beck 2002): where Nd is the number of distinct samples (Nd ≤Nseed). For one-dimensional problems, this leads to the wellknown fixed-width value of 4 3 ⁄ ⁄ ⁄ (see Silverman 1986). The reason for using N d is due to the fact that for the next simulation levels, where we are going to use a block-wise updating approach in the MCMC scheme, one may be faced with rejection of candidate states within the Markov chain. Thus, we need to count the distinct samples. In the adaptive kernel method, the idea is to use a larger width in regions of lower probability density. Following the general strategy used in the past (see Ang et al. 1992, Au and Beck 1999), the adaptive band width wi for the i th sample  i can be written as , where the local bandwidth factor  i can be estimated as follows:

Discussion on the estimation of the completeness magnitude Mc
The completeness magnitude is theoretically defined as the lowest magnitude at which 100% of the earthquakes in a space-time volume are detected (Rydelek and Sacks 1989). Following a large earthquake, there exists a detection problem of small aftershocks that cannot be distinguished from the coda wave of the mainshock (i.e. increased noise). So, while our network is capable of detecting small events in normal situation, it could not register those events in the immediate aftermath of a large earthquake. Therefore, first minutes and hours after a large mainshock, small aftershocks are missed, and therefore, we need to increase due to the detection problem. To address the catalogue incompleteness, the completeness magnitude at the beginning of each forecasting interval is checked so that the desired lower magnitude be always greater than or equal to , i.e., . The issue of magnitude incompleteness seems to be more critical when providing early forecasts in the immediate aftermath of a main seismic event. This can be attributed both to the lack of data in the short time elapsed after the main event and the missing data in certain magnitude ranges. Nevertheless, as more time passes and the observation history at the time of forecast starts to become more populated, the magnitude incompleteness seems to be less critical. Herein, we have employed three alternative methods to check/estimate the completeness magnitude that varies through time: 1. Frequency-magnitude distribution plot of the aftershock events in the observation history seq available at the time of issuing the forecast, i.e., [To, Tstart): the normal trend in frequency-magnitude curve has an approximately exponential decrease of N that stands for the number of aftershock with magnitudes equal to or greater than m, , as the observed magnitude m increases (i.e., linearly decreasing in logarithmic scale following a Gutenberg-Richter (GR) relationship. In case that the data is incomplete, a flattening in a certain lower magnitude range (having higher frequencies) can be monitored. Accordingly, is visually identified as the point where the magnitude-frequency curve becomes approximately linear in the semi-logarithmic scale (see e.g., Figure 3a in the section Results and also similar figures in this section).
2. Bayesian updating approach for calculating -value versus various magnitude thresholds: is detected as the magnitude threshold, denoted here as , where the maximum likelihood of the posterior probability distribution of  (mode of the distribution denoted as herein) rendered by the Bayesian updating approach becomes quasi invariant or reaches its peak value with respect to the adopted . This magnitude threshold can be interpreted as where c -1 is the normalizing constant of the Bayes's expression; | , is the likelihood function for data D given and ; | ≅ is the prior probability distribution. The likelihood | , can be seen as the multiplication of | , , which is the conditional probability of observing events with magnitude . Herein, we use a Uniform probability distribution to define the prior indicating non-in formative prior. For instance, Figure 3b (section Results; see also similar figures in this section) illustrates -value calculated as the maximum likelihood (mode) of posterior probability distribution | , , with respect to various magnitude thresholds . It can be seen that increases monotonically with respect to up to a certain value that indicates .
3. The semi-logarithmic plots showing the observed earthquake magnitudes as a function of the time elapsed after the mainshock is a simple way to detect the missed aftershocks with small magnitudes directly after the mainshock. This is a proper visual check to ensure that observed catalog of data is complete for magnitudes greater than that is selected by either of previous two approaches. Therefore, the data in seq (see section 4.2) that will be used for updating the ETAS model parameters does not include small magnitude ranges missed in an early time interval right after the occurrence of the main event. Figure 3c (Section Results; see also similar figures in this section) provides a graphical control to see how well is estimated by the previous two methods so that at the time of starting the forecast, the seq is complete in the range of magnitudes above . The semi-logarithmic plot indicates the presence of voids in the magnitude range right after the occurrence of main event that can be kept away by the selected level of .

Plots for Mc
The graphical representation for finding based on three methods described in Section 4.2.1 of the manuscript for four forecasting intervals right after the occurrence of the Azgeleh event with Mw7.3 is shown in Figure SI

SI-4. Background Seismicity and the maximum expected magnitude Mmax
In this section, we discuss the methods used in order to assess maximum expected earthquake magnitude Mmax (see Section 4.4, Methods, of the original manuscript) and background seismicity rate (denoted as , | in Section 4.2 , Methods, of the original manuscript) for various magnitude lower magnitude thresholds . To this end, two catalogs are combined including: (1) A comprehensive and uniform earthquake catalogue for the Iranian Plateau 14-6.32] before declustering. The standard error in magnitude determination was less than 0.2 for this catalog. This catalog is also declustered by using conjugate windows which is based on Gardner and Knopoff (1974) space-time declustring method. We found 763 clusters of earthquakes with a total number of 4811 (about 53.32%) events out of 9023. By using the EMR approach, we obtained magnitude of completeness 1.5, as shown in Figure Table 1) and the corresponding correlation matrix.    Table 1) and the corresponding correlation matrix.          . Both N-test shows that the forcasts issued in terms of the number of events is not acceptable, while S-test reveals that the spatial distribution of seismicity is predicted very well, i.e., P(S≤ S obs ) is around 50%. With reference to Figure 2, while the number of events with M≥3.0 in this day does not change with respect to the previous day (=19), the number of events with M≥3.3 is increased. This shows a sign of triggered seismicity within this day.

SI-7. Plots for estimating the completeness magnitude for different forecasting intervals in phase 3 of the seismic sequence
The graphical representation for finding based on first two methods described in Section 4.2.1 of the manuscript for various forecasting intervals right after the occurrence of the Sarpol-e Sahab event with M w 6.3 is shown in Figure  SI

SI-9. Correlation structure among posteriors of the ETAS model parameters in Phase 3
Correlation

Discussion on the correlations between pairs of model parameters  based on the 1-year seismicity:
The correlations between the pairs of ETAS parameters, learnt through (around) 1-year seq data as descried above, are reported in the Supplementary Information (Section SI-6-Results; first correlation matrix). Thus, it can be a representative of the correlation structure in a larger time span. It is seen that  has almost no correlation with other ETAS parameters. K reveals a moderate correlation with  and high correlation with temporal parameters (c, p). This observation shows that K in long term is not affected by the spatial parameters. Parameter  has correlation with temporal parameter c (and not with p as was observed previously in early forecasts in Phase 1), and to a higher extent with the spatial parameters (d, q, ). There is a high correlation between the temporal parameters (c, p), and significant negative correlation between the spatial parameters (d, ); however, the high positive correlation between (q, ), as observed in the previous early estimates in Phase 1, does not exist herein.

Discussion on the correlations between pairs of model parameters  in Phase 3:
The correlation structure between the pairs of the ETAS model parameters for the first forecasting interval in Phase 3 [T start =25/11/2018-18:00UTC, T end =26/11/2018-00:00UTC] (Section 2.2.2) are shown in the Supplementary Information (Section SI-9-Results, second correlation matrix). There are only moderate correlations between the temporal parameters (c, p), high negative correlation between spatial parameters (d, ), and moderate positive correlation between (q, ). This shows that the parameter K and  (affecting the number of forecasted events) have low correlations with each other as well as other model parameters, which can be seen as evidence that the seismicity forecast for this time interval was not accepted through N-test and S-test. For the second forecasting interval [T start =25/11/2018-19:00UTC, T end =26/11/2018-00:00UTC] (Section 2.2.3), and the last two forecasting intervals [T start =26/11/2018-00:00UTC, T end =26/11/2018-06:00UTC] and [T start =26/11/2018-06:00UTC, T end =27/11/2018-00:00UTC] (see Section 2.2.4), the correlations between pairs of the ETAS model parameters are shown in Section SI-9-Results (third to fifth correlation matrix). Compared to the previous forecast, it is interesting to note that parameter K reveals correlations with  and the two temporal parameters (c, p); however,  still does not reveal any correlation with other ETAS parameters. This is not in line with the observations in Phase 1, and it may be attributed to the presence of higher background seismicity level employed in current phase. There is also high correlation between (c, p), and between pairs of the three spatial parameters (d, q, ). It is noted that the correlations between (d, q) for different forecasting intervals in Phase 3 are higher compared to those in Phase 1. Again, this may be attributed to the nonhomogeneous background seismicity used in this phase of forecasting. ), can be interpreted as the daily rate of seismicity for . It is noted that the real number of events with 2.7 in the one-day time interval is equal to one (see the dotted blue circle in Figure SI-18a), which is within the confidence interval of the forecasted seismicity. The daily base seismicity estimated from the long-term seismic activity in the aftershock zone for 2.7 (at time old To=01/11/2017-06:00 UTC) is estimated to be . 0.0063 (see Table SI-1). Thus, the background seismicity has increased more than 300 times. On the same page, it is interesting to see the forecasted seismicity through the exceedance daily probability for 6 0.0021, as shown in Figure  SI-18a. We are interested in 6 as the Sarpol-e Zahab event of M w 6.3 took place five days after new T o (see Figure  6). From the calculation of the long-term seismicity level, the estimated daily probability is around 1 exp 6.1239 10 365 ⁄ 1.68 10 (see Table SI-1 where the rate of events from long-term background seismicity with 6 is 6.1239 10 ). Hence, the level of (forecasted) seismicity for probability of occurrence of 6 at the desired date is more than 12 times higher than the base seismicity level, which reveals an alarming level. be traced in Figure 7). It is noted that among the three events, the closest one with Mw3.1 took place less than three hours before the main event (i.e., the main event was preceded by only one foreshock in the same date). The yellowcolored pentagram shows the first aftershock with Mw4.4 that took place right after the Mw6.3 main event. With reference to the seq, a total of 16 triggered aftershocks occurred in 1-hour time interval from 17:00 UTC up to 18:00 UTC (i.e., the time of T start ). The aftershock data also features the triggered event of M w 5.3 at 17:09 UTC.